This R project, developed by me, Felipe Solares da Silva, is part of my professional portfolio. To see more projects like this, check out my portfolio at https://github.com/fsolares/professional-portfolio.
Contact: solares.fs@gmail.com
Thank you, Jaques D’Erasmo (https://github.com/Jaquesd), old friend and fellow Data Science student, for your immeasurable contribution to this project. All your feedback, code suggestions, and support along the way gave me the strength to overcome this challenge. Congratulations to us; that was amazing, real teamwork.
Build interactive graphs to map Zika virus occurrences across Brazilian territory.
If you don’t have any of these packages installed in RStudio yet, run the code below:
chooseCRANmirror(graphics = FALSE, ind = 1) # Sets a CRAN mirror non-interactively;
                                            # only needed when knitting this R Markdown file.
packs <- c('forcats', 'plotly', 'RColorBrewer', 'tmap', 'rnaturalearthdata',
           'sf', 'ggplot2', 'dplyr')
install.packages(packs) # install.packages() accepts a vector of package names
If you already have some of these packages, run the code below instead, replacing the placeholder with the name of each package you are missing:
chooseCRANmirror(graphics = FALSE, ind = 1) # Sets a CRAN mirror non-interactively;
                                            # only needed when knitting this R Markdown file.
install.packages('your missing package')    # replace with the actual package name
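If you would rather not work out by hand which packages are missing, a minimal sketch like the one below installs only the ones that cannot be loaded. It uses base R only; `missing_packages()` and `install_missing()` are hypothetical helper names, not part of any package.

```r
# Hypothetical helper: names of packages in `pkgs` that cannot be loaded.
# requireNamespace() returns FALSE instead of raising an error when a package is absent.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# Hypothetical helper: install only what is missing and return those names invisibly.
install_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}
```

Calling `install_missing(packs)` with the `packs` vector defined above installs just the gaps. Note that requireNamespace() loads the namespace of every package it finds, so the check itself can take a few seconds for heavier packages such as sf.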
Once the packages are installed, run the code below to load (attach) them.
# Attach every package listed in `packs` (defined above). library() stops with an
# error if a package is missing, whereas require() only returns FALSE with a warning.
invisible(lapply(packs, library, character.only = TRUE))
For this project, we gathered information from PAHO, the Pan American Health Organization (http://www.paho.org/data/index.php/en/?option=com_content&view=article&id=532&Itemid=), to build our own data sets. PAHO is the specialized international health agency for the Americas. It works with countries throughout the region to improve and protect people’s health, engaging in technical cooperation with its member countries to fight communicable and noncommunicable diseases and their causes, to strengthen health systems, and to respond to emergencies and disasters. PAHO provides data in different formats, acquired from the countries’ Ministries of Health through the Health Information Platform for the Americas (PLISA). After extensive cleaning and transformation, we structured the data and stored it in CSV files for further analysis.
df1 <- read.csv("W_Casos_semanales_crosstab.csv", header = T, sep = ',', stringsAsFactors = F)
df2 <- read.csv("W_Casos_semanales_crosstab2017.csv", header = T, sep = ',', stringsAsFactors = F)
df3 <- read.csv("total_incidence.csv", header = T, sep = ',', stringsAsFactors = F)
df4 <- read.csv("total_cases.csv", header = T, sep = ',', stringsAsFactors = F)
df5 <- read.csv("linegraph_2018vs2017.csv", header = T, sep = ',', stringsAsFactors = F)
df6 <- read.csv("bargraph_2018vs2017.csv", header = T, sep = ',', stringsAsFactors = F)
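Because every later step refers to specific column names, a quick check right after loading can surface a renamed or missing column early instead of deep inside a join or a plot. Below is a minimal sketch: `read_checked()` is a hypothetical helper, and the required column names in the usage note are taken from how df3 and df4 are used later in this document (df3 still has `State` before it is renamed).

```r
# Hypothetical helper: read a CSV and stop early if expected columns are absent.
read_checked <- function(path, required) {
  df <- read.csv(path, header = TRUE, sep = ',', stringsAsFactors = FALSE)
  missing_cols <- setdiff(required, colnames(df))
  if (length(missing_cols) > 0) {
    stop(sprintf("%s is missing column(s): %s", path,
                 paste(missing_cols, collapse = ", ")))
  }
  df
}
```

For example, `df4 <- read_checked("total_cases.csv", c("state", "total.cases"))` would fail with a readable message if the file's header changed.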
First, we convert a Spatial Polygons data frame to an sf (simple features) object using st_as_sf(). The data frame in question is states50, from the rnaturalearthdata package, which provides state (admin level 1) polygons for Australia, Brazil, Canada and the USA at 1:50m (medium) resolution.
states <- st_as_sf(states50)
Next, we use dplyr functions to keep only the Brazilian states and join them with our incidence and case totals, producing a new data set for plotting.
brazil <- states %>%
filter(admin == 'Brazil') %>%
select(state = name, geometry) %>%
arrange(state)
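# Overwrite the map's state names with the spellings used in df4 so the joins below
# match. This is positional: it assumes df4 lists the states in the same order as the
# alphabetically sorted map.
brazil$state <- df4$state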
colnames(df3)[colnames(df3) == 'State'] <- 'state'
brazil <- brazil %>%
left_join(df3, by = 'state') %>%
left_join(df4, by = 'state') %>%
select(state, Total.Incidence = Total.Cumulative.Cases.Incidence, Total.Cases = total.cases)
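left_join() keeps every row of `brazil` and fills the joined columns with NA wherever a state name has no exact match, which is why names must be spelled identically on both sides. The base-R sketch below illustrates that behaviour with toy values (not the project's data), using merge(all.x = TRUE), the base equivalent of a left join; the mismatched "Rio De Janeiro" spelling is deliberate.

```r
# Toy data for illustration only; not taken from the PAHO files.
map_states <- data.frame(state = c("Acre", "Bahia", "Rio de Janeiro"),
                         stringsAsFactors = FALSE)
incidence  <- data.frame(state = c("Acre", "Bahia", "Rio De Janeiro"),
                         Total.Incidence = c(10.5, 20.0, 30.2),
                         stringsAsFactors = FALSE)

# merge(all.x = TRUE) keeps all rows of the left table, like dplyr::left_join().
joined <- merge(map_states, incidence, by = "state", all.x = TRUE)

# States whose names found no exact match end up with NA in the joined column.
unmatched <- joined$state[is.na(joined$Total.Incidence)]
print(unmatched)  # [1] "Rio de Janeiro"
```

Running the same kind of NA check on the real `brazil` object after the joins is a cheap way to confirm that the positional renaming above lined up.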
Incidence, or the incidence coefficient, measures the rate at which a particular disease appears. It is calculated as the number of probable new cases divided by the population of a given geographical area, expressed per 100,000 inhabitants. Source: https://www.diferenca.com/incidencia-e-prevalencia/
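The definition above translates into a one-line formula: incidence = new cases / population × 100,000. A minimal R sketch follows; `incidence_per_100k()` is a hypothetical helper, and the numbers in the example are made up for illustration.

```r
# Incidence per 100,000 inhabitants: new cases divided by population, then scaled.
# Works element-wise, so it can be applied to whole columns at once.
incidence_per_100k <- function(new_cases, population) {
  stopifnot(all(population > 0))
  new_cases / population * 1e5
}

# Example: 150 probable new cases in a population of 600,000.
incidence_per_100k(150, 600000)  # 25
```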
tmap_mode("view")
tm_shape(brazil) +
tm_polygons("Total.Incidence", title = "Total Incidence") +
tm_scale_bar()
Prevalence, represented here by the total number of cases, measures the number of occurrences of a disease in a population over a specific period of time.
tm_shape(brazil) +
tm_polygons("Total.Cases", title = "Total Cases", breaks = c(0,15000,50000,75000,150000)) +
tm_scale_bar()
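The `breaks` vector gives tmap fixed class boundaries, so each state's total falls into one of four classes, and a value above the last break (150000) would fall outside all of them. Base R's cut() offers a quick way to preview that grouping before drawing the map. This is a minimal sketch with made-up totals; it assumes left-closed intervals with the top boundary included, which may differ from tmap's own interval closure setting.

```r
breaks <- c(0, 15000, 50000, 75000, 150000)

# Made-up totals for illustration only.
totals <- c(0, 14999, 15000, 80000, 200000)

# right = FALSE gives intervals like [0, 15000); include.lowest = TRUE closes the
# top interval at 150000. Values outside the breaks become NA.
classes <- cut(totals, breaks = breaks, right = FALSE, include.lowest = TRUE)
print(as.integer(classes))  # [1]  1  1  2  4 NA
```

Comparing `max(brazil$Total.Cases)` with the last break is a simple guard against a state silently dropping out of the legend.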
# Line plot
df5$label <- as.factor(df5$label)
plot <- df5 %>%
ggplot(aes(x = weeks, y = totalperweek)) +
geom_line(aes(colour = label), linewidth = 1.3) + # linewidth replaces size for lines in ggplot2 >= 3.4.0
geom_point(aes(colour = label), size = 0.8) +
theme(plot.title = element_text(face = "bold"),
plot.caption = element_text(face = "bold"),
panel.background = element_blank(),
axis.text.x = element_text(size = 7),
axis.title = element_text(face = "bold",size = 12),
axis.line.x = element_line(colour = "black",
linewidth = 1,
lineend = "butt"),
axis.line.y = element_line(colour = "black",
linewidth = 1,
lineend = "butt"),
legend.title = element_text(size=10, color = "black", face="bold"),
legend.position= c(0.85, 0.5),
legend.background = element_blank(),
legend.key = element_blank()) +
labs(title="Weekly Cases", subtitle="2018 vs 2017",
caption = 'Source: PAHO - Pan American Health Organization',
y="Total Cases/Week", x="Epidemiological Weeks", color = 'Years') +
scale_x_continuous(breaks = c(1:52), expand=c(0.009, 0))
ggplotly(plot)
# Bar plot
df6$year <- as.factor(df6$year)
plot2 <- df6 %>%
ggplot(aes(x = state, y = cumulative.total)) +
geom_bar(aes(fill = year), stat = 'identity') +
theme(plot.title = element_text(face = "bold"),
plot.caption = element_text(face = "bold"),
panel.background = element_blank(),
axis.title.y = element_text(face = "bold", size = 12, vjust = 3),
axis.title.x = element_blank(),
axis.line.x = element_line(colour = "black",
linewidth = 1,
lineend = "butt"),
axis.line.y = element_line(colour = "black",
linewidth = 1,
lineend = "butt"),
axis.text.x=element_text(size=10, angle = 90,
hjust=1),
legend.title = element_text(size=10, color = "black", face="bold"),
legend.position= c(0.85, 0.7),
legend.background = element_blank(),
legend.key = element_blank()) +
labs(title="Yearly Prevalence", subtitle="2018 vs 2017",
caption = 'Source: PAHO - Pan American Health Organization',
y="Total Cases/State", fill = 'Years') +
scale_fill_brewer(palette = "Set1")
ggplotly(plot2)
df3$state <- as.factor(df3$state)
plot3 <- df3 %>%
mutate(state = fct_reorder(state, Weekly.Cases.Incidence, .desc = T)) %>%
ggplot(aes(x = state, y = Weekly.Cases.Incidence)) +
geom_bar(stat = 'identity',width = 0.8, fill = "#E69F00") +
geom_text(aes(label = Weekly.Cases.Incidence), position = position_dodge(width = 1),
size = 3.5, vjust = -1)+
theme(plot.title = element_text(face = "bold", vjust = 2),
plot.caption = element_text(face = "bold"),
plot.subtitle = element_text(vjust = 3),
panel.background = element_blank(),
axis.title.y = element_text(face = "bold", size = 12, vjust = 3),
axis.title.x = element_blank(),
axis.line.x = element_line(colour = "black",
linewidth = 1,
lineend = "butt"),
axis.line.y = element_line(colour = "black",
linewidth = 1,
lineend = "butt"),
axis.text.x = element_text(size=10, angle = 90,
hjust=1),
legend.title = element_text(size=10, color = "black", face="bold"),
legend.position = c(0.85, 0.7),
legend.background = element_blank(),
legend.key = element_blank()) +
labs(title="Cumulative Incidence", subtitle="per State",
caption = 'Source: PAHO - Pan American Health Organization',
y="Cumulative Incidence")
ggplotly(plot3, tooltip = 'all') %>%
style(hoverinfo ='none')